UniNE at CLEF 2009: Persian Ad Hoc Retrieval and IP
Abstract
This paper describes the participation of the University of Neuchâtel in the CLEF 2009 evaluation campaign. In the Persian ad hoc task, we propose a light suffix-stripping algorithm for the Farsi language. The evaluations showed that this approach performs better than a simple light stemmer, an approach that ignores the stemming stage, or a language-independent approach (n-gram). Blind query expansion (e.g., Rocchio’s model) may improve retrieval effectiveness, and combining different indexing and search strategies may further improve the mean average precision (MAP). In the Intellectual Property (IP) task, we tried different strategies to select and weight pertinent words extracted from a patent description in order to form an effective query. We also evaluated different search models and found that probabilistic models tend to perform better than vector-space schemes.
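As an illustration of the blind query expansion mentioned in the abstract, here is a minimal Python sketch of Rocchio-style pseudo-relevance feedback. It is not the authors' implementation. The function name `rocchio_expand`, the dictionary-based term weights, and the default values of `alpha`, `beta` and `n_terms` are all assumptions made for this example. The top-ranked documents from a first search are treated as relevant, and their strongest terms are added to the query.

```python
from collections import Counter


def rocchio_expand(query_weights, feedback_docs, alpha=1.0, beta=0.75, n_terms=10):
    """Expand a query with terms from the top-ranked (assumed relevant) documents.

    query_weights: dict mapping term -> weight in the original query
    feedback_docs: list of dicts mapping term -> weight, one per feedback document
    alpha, beta, n_terms: illustrative defaults, not values taken from the paper
    """
    if not feedback_docs:
        return dict(query_weights)

    # Centroid (average term weight) of the feedback documents.
    centroid = Counter()
    for doc in feedback_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(feedback_docs)

    # Keep only the n_terms strongest centroid terms (ties broken alphabetically).
    strongest = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]

    # New weight = alpha * original query weight + beta * centroid weight.
    expanded = {term: alpha * weight for term, weight in query_weights.items()}
    for term, weight in strongest:
        expanded[term] = expanded.get(term, 0.0) + beta * weight
    return expanded
```

In practice the expanded query is submitted for a second retrieval pass. The number of feedback documents and added terms usually has to be tuned per collection, which is one reason the gain from blind expansion is not guaranteed.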
Related resources
German, French, English and Persian Retrieval Experiments at CLEF 2009
We describe evaluation experiments conducted by submitting retrieval runs for the monolingual German, French, English and Persian (Farsi) information retrieval tasks of the Ad Hoc Track of the Cross-Language Evaluation Forum (CLEF) 2009. In the ad hoc retrieval tasks, the system was given 50 natural language queries, and the goal was to find all of the relevant records or documents (with high p...
JHU Experiments in Monolingual Farsi Document Retrieval at CLEF 2009
At CLEF 2009 JHU submitted runs in the ad hoc track for the monolingual Persian evaluation. Variants of character n-gram tokenization provided a 10% relative gain over unnormalized words. A run based on skip n-grams, which allow internal skipped letters, achieved a mean average precision of 0.4938. Using traditional 5-grams resulted in a score of 0.4868 while plain words had a score of 0.4463.
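The following Python sketch shows what character n-gram tokenization can look like and checks the relative gains implied by the scores quoted above. It is not the JHU system. The helper names `char_ngrams` and `relative_gain` are hypothetical, and the choices to build n-grams within a single word and to keep shorter words whole are assumptions made for this example.

```python
def char_ngrams(word, n=5):
    """Return the overlapping character n-grams of a word.

    Assumption for this sketch: n-grams do not cross word boundaries,
    and a word shorter than n characters is kept as a single token.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def relative_gain(score, baseline):
    """Relative improvement of score over baseline, as a fraction."""
    return (score - baseline) / baseline


# Mean average precision values quoted in the abstract above.
MAP_WORDS = 0.4463
MAP_5GRAMS = 0.4868
MAP_SKIP_NGRAMS = 0.4938

if __name__ == "__main__":
    print(char_ngrams("retrieval"))
    print(f"5-grams vs. words:      {relative_gain(MAP_5GRAMS, MAP_WORDS):.1%}")
    print(f"skip n-grams vs. words: {relative_gain(MAP_SKIP_NGRAMS, MAP_WORDS):.1%}")
```

With the quoted scores, the gain over plain words is about 9.1% for 5-grams and about 10.6% for skip n-grams, which is consistent with the roughly 10% relative gain stated in the abstract.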
Ad Hoc Retrieval with the Persian Language
This paper describes our participation in the Persian ad hoc search during the CLEF 2009 evaluation campaign. In this task, we suggest using a light suffix-stripping algorithm for the Farsi (or Persian) language. The evaluations based on different probabilistic models demonstrated that our stemming approach performs better than a stemmer removing only the plural suffixes, or statistically bette...
Ad Hoc Information Retrieval for Persian
In this paper we present an introduction to the Persian language and its morphology, and describe available resources for Persian text processing. We then propose and evaluate an information retrieval model, a variation of the vector space model which uses the relations existing between query terms. Our experiments on the Hamshahri collection show that the proposed model has better precision fo...
CLEF 2009 Ad Hoc Track Overview: TEL and Persian Tasks
The 2009 Ad Hoc track was to a large extent a repetition of last year’s track, with the same three tasks: Tel@CLEF, Persian@CLEF, and Robust-WSD. In this first of the two track overviews, we describe the objectives and results of the TEL and Persian tasks and provide some statistical analyses.